
Provost, Foster

Topic Weight Topic Terms
0.405 data classification statistical regression mining models neural methods using analysis techniques performance predictive networks accuracy
0.259 mobile telecommunications devices wireless application computing physical voice phones purchases ubiquitous applications conceptualization secure pervasive
0.202 differences analysis different similar study findings based significant highly groups popular samples comparison similarities non-is
0.198 approach analysis application approaches new used paper methodology simulation traditional techniques systems process based using
0.166 theory theories theoretical paper new understanding work practical explain empirical contribution phenomenon literature second implications
0.154 explanations explanation bias use kbs biases facilities cognitive making judgment decisions likely decision important prior
0.149 decision making decisions decision-making makers use quality improve performance managers process better results time managerial
0.142 office document documents retrieval automation word concept clustering text based automated created individual functions major
0.111 network networks social analysis ties structure p2p exchange externalities individual impact peer-to-peer structural growth centrality
0.101 set approach algorithm optimal used develop results use simulation experiments algorithms demonstrate proposed optimization present

[Co-authorship network figure: focal researcher; coauthors of the focal researcher (1st degree); coauthors of coauthors (2nd degree). The number on each edge is the number of co-authorships.]

Coauthors: Martens, David (2); Murray, Alan (1); Saar-Tsechansky, Maytal (1)

Keywords: active learning (1); analytical modeling (1); classifier induction (1); comprehensibility (1); decision making (1); decision-support systems (1); document classification (1); design science (1); instance level explanation (1); mobile computing (1); network analysis (1); predictive modeling (1); text mining (1)

Articles (3)

Finding Similar Mobile Consumers with a Privacy-Friendly Geosocial Design (Information Systems Research, 2015)
Abstract:
    This paper focuses on finding the same and similar users based on location-visitation data in a mobile environment. We propose a new design that uses consumer-location data from mobile devices (smartphones, smart pads, laptops, etc.) to build a "geosimilarity network" among users. The geosimilarity network (GSN) could be used for a variety of analytics-driven applications, such as targeting advertisements to the same user on different devices or to users with similar tastes, and to improve online interactions by selecting users with similar tastes. The basic idea is that two devices are similar, and thereby connected in the GSN, when they share at least one visited location. They are more similar as they visit more shared locations and as the locations they share are visited by fewer people. This paper first introduces the main ideas and ties them to theory and related work. It next introduces a specific design for selecting entities with similar location distributions, the results of which are shown using real mobile location data across seven ad exchanges. We focus on two high-level questions: (1) Does geosimilarity allow us to find different entities corresponding to the same individual, for example, as seen through different bidding systems? And (2) do entities linked by similarities in local mobile behavior show similar interests, as measured by visits to particular publishers? The results are positive on both counts. Specifically, for (1), even with the data sample's limited observability, 70%-80% of the time the same individual is connected to herself in the GSN. For (2), the GSN neighbors of visitors to a wide variety of publishers are substantially more likely also to visit those same publishers. Highly similar GSN neighbors show very substantial lift.
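The abstract's core construction can be sketched in a few lines: connect two devices when they share a visited location, and score the connection higher for more shared locations and for rarer ones. The IDF-style log weighting below is an illustrative assumption, not the paper's exact similarity measure, and `build_gsn` is a hypothetical name.

```python
from collections import defaultdict
from math import log

def build_gsn(visits):
    """Sketch of a geosimilarity network (GSN).

    visits: dict mapping device id -> set of visited location ids.
    Returns a dict mapping unordered device pairs -> similarity score.
    Two devices are connected when they share at least one location;
    locations visited by fewer devices contribute more (an IDF-style
    weighting, assumed here for illustration).
    """
    n = len(visits)
    # How many devices visited each location (its "popularity").
    popularity = defaultdict(int)
    for locs in visits.values():
        for loc in locs:
            popularity[loc] += 1
    # Invert to location -> devices so we only enumerate co-visiting pairs.
    devices_at = defaultdict(set)
    for dev, locs in visits.items():
        for loc in locs:
            devices_at[loc].add(dev)
    edges = defaultdict(float)
    for loc, devs in devices_at.items():
        weight = log(1 + n / popularity[loc])  # rarer location, higher weight
        devs = sorted(devs)
        for i, a in enumerate(devs):
            for b in devs[i + 1:]:
                edges[(a, b)] += weight
    return dict(edges)

visits = {
    "phone": {"cafe", "office", "gym"},
    "laptop": {"cafe", "office"},
    "stranger": {"cafe"},
}
gsn = build_gsn(visits)
# "phone" and "laptop" share two locations, one of them rare, so their
# edge outweighs the edge each has to "stranger" via the popular cafe.
```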
Explaining Data-Driven Document Classifications (MIS Quarterly, 2014)
Abstract:
    Many document classification applications require human understanding of the reasons for data-driven classification decisions by managers, client-facing employees, and the technical team. Predictive models treat documents as data to be classified, and document data are characterized by very high dimensionality, often with tens of thousands to millions of variables (words). Unfortunately, due to the high dimensionality, understanding the decisions made by document classifiers is very difficult. This paper begins by extending the most relevant prior theoretical model of explanations for intelligent systems to account for some missing elements. The main theoretical contribution is the definition of a new sort of explanation as a minimal set of words (terms, generally), such that removing all words within this set from the document changes the predicted class from the class of interest. We present an algorithm to find such explanations, as well as a framework to assess such an algorithm’s performance. We demonstrate the value of the new approach with a case study from a real-world document classification task: classifying web pages as containing objectionable content, with the goal of allowing advertisers to choose not to have their ads appear on those pages. A second empirical demonstration on news-story topic classification shows the explanations to be concise and document-specific, and to be capable of providing understanding of the exact reasons for the classification decisions, of the workings of the classification models, and of the business application itself. We also illustrate how explaining the classifications of documents can help to improve data quality and model performance.
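The explanation the abstract defines, a minimal set of words whose removal flips the predicted class, can be illustrated with a simple greedy search; the paper's actual algorithm is more sophisticated, and the toy `spam_score` classifier below is purely hypothetical.

```python
def explain_document(words, score, threshold):
    """Greedily find a small set of words whose removal flips the prediction.

    words: list of words in the document.
    score: function mapping a word list -> score for the class of interest.
    The document is predicted in the class of interest while
    score(words) >= threshold.  This greedy loop is a simplified sketch of
    the minimal-word-set explanation idea, not the paper's algorithm.
    """
    remaining = list(words)
    removed = []
    while remaining and score(remaining) >= threshold:
        # Remove the single word whose absence lowers the score the most
        # (all occurrences of that word are removed at once).
        best = min(
            sorted(set(remaining)),
            key=lambda w: score([x for x in remaining if x != w]),
        )
        removed.append(best)
        remaining = [x for x in remaining if x != best]
    return removed

# Toy classifier: fraction of a tiny "objectionable" vocabulary present.
SPAM = {"win", "cash"}

def spam_score(ws):
    return len(set(ws) & SPAM) / len(SPAM)

explanation = explain_document(["free", "win", "cash", "hello"], spam_score, 0.5)
# Removing both vocabulary hits flips the toy classifier's prediction.
```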
Decision-Centric Active Learning of Binary-Outcome Models (Information Systems Research, 2007)
Abstract:
    It can be expensive to acquire the data required for businesses to employ data-driven predictive modeling; consider, for example, modeling consumer preferences to optimize targeting. Prior research has introduced "active-learning" policies for identifying data that are particularly useful for model induction, with the goal of decreasing the statistical error for a given acquisition cost (error-centric approaches). However, predictive models are used as part of a decision-making process, and costly improvements in model accuracy do not always result in better decisions. This paper introduces a new approach for active data acquisition that specifically targets decision making. The new decision-centric approach departs from traditional active learning by placing emphasis on acquisitions that are more likely to affect decision making. We describe two different types of decision-centric techniques. Next, using direct-marketing data, we compare various data-acquisition techniques. We demonstrate that strategies for reducing statistical error can be wasteful in a decision-making context, and show that one decision-centric technique in particular can improve targeting decisions significantly. We also show that this method is robust to decreasing quality of the utility estimates, eventually converging to uniform random sampling, and that it can be extended to situations where different data acquisitions have different costs. The results suggest that businesses should consider modifying their strategies for acquiring information through normal business transactions. For example, a firm such as Amazon.com that models consumer preferences for customized marketing may accelerate learning by proactively offering recommendations, not merely to induce immediate sales but also to improve its recommendations in the future.
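The contrast the abstract draws can be made concrete with a deliberately simplified scoring rule: uncertainty sampling prioritizes instances near the classification boundary p = 0.5, while a decision-centric policy prioritizes instances near the *decision* boundary implied by costs and payoffs. The function name and rule below are illustrative assumptions, not the paper's actual techniques.

```python
def decision_centric_rank(probs, cost, value):
    """Rank candidate label acquisitions by how close each instance sits to
    the decision boundary (target iff p * value > cost), rather than to the
    classification boundary p = 0.5 used by uncertainty sampling.

    probs: dict mapping instance id -> current estimate of P(positive).
    A simplified sketch of the decision-centric idea, not the paper's policy.
    """
    boundary = cost / value  # targeting pays off only above this probability
    return sorted(probs, key=lambda i: abs(probs[i] - boundary))

# With mailing cost 1 and response value 10, the decision flips at p = 0.1.
ranked = decision_centric_rank({"a": 0.5, "b": 0.12, "c": 0.9}, cost=1, value=10)
# "b" matters most to the targeting decision, even though "a" is the most
# *uncertain* instance and would be chosen first by uncertainty sampling.
```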